-
Notifications
You must be signed in to change notification settings - Fork 1.6k
tls: Make session resume key shared across credentials builders creds #2709
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good to me although I have not been able to test it yet myself. I understand that @dkropachev was able to test that it fixes the right problem (sessions were not being reused on a multi-shard applications, but were reused on a single shard), but @dkropachev - were you able to test that this specific patch fixes the problem on multi-shard runs?
@elcallio I couldn't understand where exactly in the code do we have a cross-shard cache of these session keys, if this is done in a safe multi-threaded way, and when do we ever drop old session keys from this cache, but I'm sure you thought about it already and it works well :-)
Not a cross shard cache, but a pre-computed key in the credentials builder. This is applied to all credentials created with this builder (or a copy of it).
Session keys are re-generated every time you set the session resume mode, or every time a reloadable credentials do a reload. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm apart from a nit.
5b4d608
to
b174593
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
void tls::credentials_builder::set_session_resume_mode(session_resume_mode m) { | ||
_session_resume_mode = m; | ||
if (m != session_resume_mode::NONE) { | ||
gnutls_datum key; | ||
gtls_chk(gnutls_session_ticket_key_generate(&key)); | ||
_session_resume_key.assign(key.data, key.data + key.size); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an idea, If we want to tls renegotiation to work even when node is being restarted we need _session_resume_key
to be persisted either in the config or as a file or any other way.
Which means we will need another API that could feed pregenerated _session_resume_key
to the credentials builder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is unnecessary. If you have node restarts so frequent it impacts TLS session management, you have a bigger problem.
Should the need arise, get/set API:s for this can be added, but for now I'd say it is overreach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is unnecessary. If you have node restarts so frequent it impacts TLS session management, you have a bigger problem. Should the need arise, get/set API:s for this can be added, but for now I'd say it is overreach.
It is not about frequency, it is about of amount of clients that are trying to reconnect at the same time.
When client is reconnecting it establishes TLS connection first and the go thru CQL initialization process, CPU pressure of having thousands of clients reconnecting at the same time could be so high that it impedes CQL initialization process, as result client breaks connection and reconnects again.
Though, on second reconnect, it should have TLS ticket learned, so it should go smoother.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point is that a restart should be infrequent. Connections can use session resume across the nodes uptime or the ticket expiry, whichever comes first (lets hope the latter). But also remember that storing the session key across restarts would more or less require them to be clean restarts (because we do want to refresh the key at least on cert reload etc), which, in the case of crashes, makes it even less useful.
Also remember that using the same key for to long is a security concern, and already the cross-session usage here is somewhat dubious (but since we can treat cross-shard instances as equivalent to a "normal" server single session ok).
size = 0; | ||
gnutls_datum(void* d = nullptr, size_t s = 0) { | ||
data = reinterpret_cast<unsigned char*>(d); | ||
size = s; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this is internal, it's still a dangerous constructor (esp. the non-explicit conversion from any pointer).
Please change to std::span<std::byte>.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better solution is to make the constructor only take a size and do the alloc internally, as the memory must be allocated by gnutls_malloc. Better to enforce.
src/net/tls.cc
Outdated
@@ -436,12 +436,20 @@ class tls::certificate_credentials::impl: public gnutlsobj { | |||
client_auth get_client_auth() const { | |||
return _client_auth; | |||
} | |||
void set_session_resume_mode(session_resume_mode m) { | |||
void set_session_resume_mode(session_resume_mode m, const std::vector<uint8_t>& key = {}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here a span is more flexible than a vector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a huge overkill for an internal call. But sure.
b174593
to
0065577
Compare
Fixes scylladb#2708 Session resume key (if enabled) was only generated/kept on credentials level. This means that access patterns where we create a new such object per shard, using same builder, would have different session keys, and thus not be able to resume each others sessions. This moves key generation to builder level and re-uses this for all generated server credentials. Note that a reload of any file (in reloadable) will re-create the key (though it should become invalid anyway, but...). Add tests for this, and a hefty warning that session keys are shared, should anyone try to re-use a builder for disparate credentials. v2: * Make gnutls_datum more container-like (i.e. encapsulated sized alloc)
0065577
to
ed9c2a0
Compare
ping? |
Fixes #2708
Session resume key (if enabled) was only generated/kept on credentials level. This means that access patterns where we create a new such object per shard, using same builder, would have different session keys, and thus not be able to resume each others sessions.
This moves key generation to builder level and re-uses this for all generated server credentials. Note that a reload of any file (in reloadable) will re-create the key (though it should become invalid anyway, but...).
Add tests for this, and a hefty warning that session keys are shared, should anyone try to re-use a builder for disparate credentials.